LINEA
One of LINEA’s main advantages is the simplicity with which it can capture non-linear relations. Capturing non-linear relations is fundamental when applying regression, as they are often more realistic representations of the real world.
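This page covers a basic implementation of the linea library to analyse a time series. We’ll cover the default transformations (decay, diminish, hill_function, lag and ma), the model table, custom transformations, and testing combinations of transformations with what_trans() and what_combo().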
We will run a simple model on some fictitious data sourced from Google Trends to understand which variables seem to have an impact on the ecommerce variable.
We start by importing linea, some other useful libraries, and some data.
library(linea) # modelling
library(tidyverse) # data manipulation
library(plotly) # visualization
library(DT) # visualization
data_path = 'https://raw.githubusercontent.com/paladinic/data/main/ecomm_data.csv'
data = read_xcsv(file = data_path)
data = data %>%
get_seasonality(date_col_name = 'date',date_type = 'weekly starting')
data %>%
datatable(rownames = NULL,
options = list(scrollX = TRUE))
linea provides a few default transformations meant to capture non-linear relationships in the data: decay(), diminish(), hill_function(), lag() and ma().
The linea::decay() function applies a decay by adding to each data point a percentage of the previous one. This transformation is meant to capture the impact of an event over time. This function only makes sense in time-bound models.
raw_variable = data$online_media
dates = data$date
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(y = decay(raw_variable, decay = 0.5),
x = dates,
name = 'transformed: decay 50%') %>%
add_lines(y = decay(raw_variable, decay = 0.75),
x = dates,
name = 'transformed: decay 75%') %>%
add_lines(y = decay(raw_variable, decay = 0.95),
x = dates,
name = 'transformed: decay 95%') %>%
layout(title = 'decay',
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
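To make the decay concrete, here is a minimal base R sketch of a geometric decay (adstock). It assumes that each point receives decay times the previous transformed value; decay_sketch() is a hypothetical helper, not linea's source code, and linea::decay() may differ in details such as how the first value is handled.

```r
# Hypothetical sketch of a geometric decay (adstock); not linea's implementation.
# Assumption: each value gets `decay` times the previous *transformed* value added.
decay_sketch <- function(x, decay) {
  out <- as.numeric(x)
  if (length(out) < 2) return(out)
  for (i in 2:length(out)) {
    # carry over a share of the previous (already decayed) value
    out[i] <- out[i] + decay * out[i - 1]
  }
  out
}

# A single spike of 100 followed by three zeros
spike <- c(100, 0, 0, 0)
decay_sketch(spike, decay = 0.5)
# returns 100, 50, 25, 12.5
```

With decay = 0.5 a single spike keeps halving in the following periods, which is the carry-over effect the plot above illustrates; higher decay values stretch that tail further.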
The linea::diminish() function applies a negative exponential function:
\[ 1 - e^{-v/m} \]
or, equivalently,
\[ 1 - \frac{1}{e^{v/m}} \]
where v is the vector to be transformed and m defines the shape of the transformation. Here is a visualization of the transformation.
raw_variable = data$christmas
dates = data$date
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = diminish(raw_variable, m = 0.3, abs = F),
x = dates,
name = 'transformed: diminish 30%',
yaxis = "y2"
) %>%
layout(title = 'diminish',
yaxis2 = list(overlaying = "y",
showgrid = F,
side = "right"),
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
This transformation can also be visualized by placing the raw and transformed variables on the horizontal and vertical axes, respectively.
plot_ly() %>%
add_lines(
x = raw_variable,
y = diminish(raw_variable,.25,F),
name = 'diminish 25%',
line = list(shape = "spline")
) %>%
add_lines(
x = raw_variable,
y = diminish(raw_variable,.5,F),
name = 'diminish 50%',
line = list(shape = "spline")
) %>%
add_lines(
x = raw_variable,
y = diminish(raw_variable,.75,F),
name = 'diminish 75%',
line = list(shape = "spline")
) %>%
layout(title = 'raw vs. diminished',
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
plot_ly() %>%
add_trace(
x = raw_variable,
y = diminish(raw_variable,.5,F),
type = 'scatter',
mode = 'markers'
) %>%
layout(title = 'raw vs. diminished (m = 50%)',
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
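The diminishing returns in the formula can be checked numerically. The sketch below evaluates 1 - exp(-v/m) directly in base R with an absolute m of 10; diminish_sketch() is a hypothetical helper and ignores the abs argument used in the linea calls above.

```r
# Hypothetical helper evaluating the negative exponential 1 - exp(-v / m);
# a sketch of the formula, not linea's own diminish() code.
diminish_sketch <- function(v, m) {
  stopifnot(m > 0)
  1 - exp(-v / m)
}

v <- c(0, 10, 20, 40)
round(diminish_sketch(v, m = 10), 3)
# returns 0.000 0.632 0.865 0.982

# The two forms of the formula are the same function
all.equal(1 - exp(-v / 10), 1 - 1 / exp(v / 10))
```

The first 10 units of v add about 0.632, the next 10 only about 0.233, and the output approaches 1 as v grows.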
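The linea::hill_function() function applies a transformation similar to linea::diminish(), as it also captures diminishing returns. However, it requires more inputs and can generate an S-curve.
\[ b - \frac{b \cdot k^m}{k^m + v^m} \]
where v is the vector to be transformed, b is the upper limit that the transformed values approach, k is the value of v at which the output reaches half of b, and m controls how steep the curve is.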
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = hill_function(raw_variable, m = 5,k = 25),
x = dates,
name = 'transformed: hill_function m = 5,k = 25',
yaxis = "y2"
) %>%
layout(title = 'hill_function',
yaxis2 = list(overlaying = "y",
showgrid = F,
side = "right"),
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
This transformation can also be visualized by placing the raw and transformed variables on the horizontal and vertical axes, respectively.
plot_ly() %>%
add_trace(
x = raw_variable,
y = hill_function(raw_variable,m = 5,k = 25),
type = 'scatter',
mode = 'markers'
) %>%
layout(title = 'raw vs. hill_function m = 5,k = 25',
xaxis = list(showgrid = F),
plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)")
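Evaluating the hill formula at a few points shows the role of each parameter. This base R sketch sets b = 1 as a default purely for illustration (the linea calls above do not pass b, and how linea sets it is not shown here); hill_sketch() is a hypothetical helper.

```r
# Hypothetical sketch of the formula b - b * k^m / (k^m + v^m); not linea's code.
# Assumption: b defaults to 1 so the output lies between 0 and 1.
hill_sketch <- function(v, m, k, b = 1) {
  b - (b * k^m) / (k^m + v^m)
}

hill_sketch(c(0, 25, 50), m = 5, k = 25)
# returns 0, 0.5 and 32/33 (about 0.970)
```

At v = k = 25 the output is exactly half of b, and with m = 5 the curve stays flat near zero before rising steeply, which is what produces the S shape.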
The linea::lag() function applies a lag to the data. This transformation is meant to capture relationships that are lagged in time. This function only makes sense on time-bound models.
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = linea::lag(raw_variable, l = 5),
x = dates,
name = 'transformed: lag 5'
) %>%
add_lines(
y = linea::lag(raw_variable, l = 10),
x = dates,
name = 'transformed: lag 10'
) %>%
add_lines(
y = linea::lag(raw_variable, l = 20),
x = dates,
name = 'transformed: lag 20'
) %>%
layout(plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)",
title = 'lag',
xaxis = list(showgrid = F))
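The examples call linea::lag() with its namespace because dplyr, loaded by the tidyverse, also exports a lag() function. The sketch below shows the idea of a lag in base R; lag_sketch() is hypothetical, and padding the first l values with zeros is an assumption (linea may fill them differently).

```r
# Hypothetical lag: shift the series forward by l periods.
# Assumption: the first l values are padded with zeros.
lag_sketch <- function(x, l) {
  stopifnot(l >= 0, l < length(x))
  if (l == 0) return(as.numeric(x))
  c(rep(0, l), x[seq_len(length(x) - l)])
}

lag_sketch(1:6, l = 2)
# returns 0 0 1 2 3 4
```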
The linea::ma() function applies a moving average to the data. This transformation is meant to capture relationships that are smoothed over time. This function only makes sense on time-bound models.
plot_ly() %>%
add_lines(y = raw_variable, x = dates, name = 'raw') %>%
add_lines(
y = ma(raw_variable, width = 5),
x = dates,
name = 'transformed: ma 5'
) %>%
add_lines(
y = ma(raw_variable, width = 15),
x = dates,
name = 'transformed: ma 15'
) %>%
add_lines(
y = ma(raw_variable, width = 25),
x = dates,
name = 'transformed: ma 25'
) %>%
add_lines(
y = ma(raw_variable, width = 25,align = 'left'),
x = dates,
name = 'transformed: ma 25 left'
) %>%
add_lines(
y = ma(raw_variable, width = 25,align = 'right'),
x = dates,
name = 'transformed: ma 25 right'
) %>%
layout(plot_bgcolor = "rgba(0, 0, 0, 0)",
paper_bgcolor = "rgba(0, 0, 0, 0)",
xaxis = list(showgrid = F),
title='ma')
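A moving average can be sketched with stats::filter(), called with its namespace because dplyr masks filter(). In this hypothetical ma_sketch(), sides = 2 uses a centred window and sides = 1 uses the current and previous values; how these map to linea's align = 'left' and align = 'right', and how linea treats the edges (NA here), are not assumed.

```r
# Hypothetical moving average with equal weights; not linea's ma() code.
ma_sketch <- function(x, width, sides = 2) {
  weights <- rep(1 / width, width)
  # stats::filter() returns a ts object; as.numeric() drops the ts attributes
  as.numeric(stats::filter(x, weights, method = "convolution", sides = sides))
}

x <- c(1, 2, 3, 4, 5)
ma_sketch(x, width = 3)             # NA 2 3 4 NA (centred window)
ma_sketch(x, width = 3, sides = 1)  # NA NA 2 3 4 (current and two previous values)
```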
linea can capture non-linear relationships by applying transformations to the raw data and then fitting the regression on the transformed data. This is done using a model table, which specifies each variable’s transformation parameters. The function linea::build_model_table() can be used to generate a blank model table.
ivs = c('covid','christmas','trend')
model_table = build_model_table(ivs = ivs)
model_table %>%
datatable(rownames = NULL,
options = list(scrollX = T,
dom = "t"))
The model table can be written to a CSV or Excel file and modified outside of R, or modified in R using dplyr, as shown below. In this example, the model will apply linea::diminish() (with a parameter of 10) and linea::decay() (with a parameter of 0.5) to the “covid” variable.
model_table = model_table %>%
mutate(diminish = if_else(variable == 'covid','10',diminish)) %>%
mutate(decay = if_else(variable == 'covid','.5',decay))
model_table %>%
datatable(rownames = NULL,
options = list(scrollX = T,
dom = "t"))
The model table can be used as an input in the linea::run_model() function. The linea::response_curves() function will display the non-linear relationship captured by the model.
dv = 'ecommerce'
model = run_model(data = data,
dv = dv,
model_table = model_table)
model %>%
response_curves(
x_min = 0,
x_max = 30,
y_min = 0,
y_max = 20000,
interval = 0.01
)
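The idea behind running a model from a model table (transform first, then fit a linear regression) can be illustrated with lm() on synthetic data. This sketch is not linea's implementation: the data are made up, and diminish_10() is a hypothetical helper using the diminish formula with an absolute m of 10.

```r
# Synthetic data for illustration only (not the ecomm_data used above).
diminish_10 <- function(v) 1 - exp(-v / 10)  # diminish formula with m = 10

x <- seq(0, 30, by = 0.5)
y <- 5 + 3 * diminish_10(x)  # exact relationship, no noise

# The regression is linear in the transformed variable, not in x
fit <- lm(y ~ diminish_10(x))
coef(fit)  # intercept 5 and slope 3, up to floating-point error

# Response curve: modelled contribution of x over a grid of values
curve_x <- seq(0, 30, by = 5)
contribution <- coef(fit)[[2]] * diminish_10(curve_x)
```

The model stays linear in its coefficients; the non-linearity lives in the transformed regressor, and multiplying the coefficient by the transformed grid gives a curve similar in spirit to what response_curves() displays.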
The default transformations cover an extensive range of non-linear relationships, but linea also allows users to supply their own transformations through the trans_df. The trans_df is effectively a table mapping functions, expressed in R, to their names and order of execution. In the example below, the function sin(x*a) is added to the list of transformations as custom_1. The parameters passed to the transformations must be expressed as letters, starting from a, then b, c and so on.
trans_df = data.frame(
name = c('hill', 'decay', 'lag', 'ma','custom_1'),
func = c(
'linea::hill_function(x,a,b,c)',
'linea::decay(x,a)',
'linea::lag(x,a)',
'linea::ma(x,a)',
'sin(x*a)'
),
order = 1:5
)
trans_df %>%
datatable(rownames = NULL)
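Because each func entry is an R expression in x and the letters a, b, c and so on, such strings can be evaluated by binding x to the data and the letters to the parameter values. The sketch below is a hypothetical illustration of that mechanism using only base R expressions (x^a and sin(x*a)); it is not linea's internal code, and apply_trans_sketch() and my_trans are made-up names.

```r
# Hypothetical sketch: evaluate transformation strings in the given order,
# binding x to the data and a, b, c, ... to the parameter values.
apply_trans_sketch <- function(x, trans, params) {
  trans <- trans[order(trans$order), ]
  for (i in seq_len(nrow(trans))) {
    p <- params[[trans$name[i]]]
    if (is.null(p)) next  # transformation not used for this variable
    env <- list(x = x)
    env[letters[seq_along(p)]] <- as.list(p)
    x <- eval(parse(text = trans$func[i]), envir = env)
  }
  x
}

my_trans <- data.frame(
  name = c('power', 'custom_1'),
  func = c('x^a', 'sin(x*a)'),
  order = 1:2,
  stringsAsFactors = FALSE
)

apply_trans_sketch(c(0, 1, 2), my_trans, params = list(custom_1 = 0.5))
apply_trans_sketch(c(0, 1, 2), my_trans, params = list(power = 2, custom_1 = 0.5))
```

Rows run in the sequence given by the order column, so with both transformations set the second call returns sin((x^2) * 0.5).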
This trans_df can now be used to generate a model table and run models.
model_table = build_model_table(ivs = ivs,
trans_df = trans_df) %>%
mutate(custom_1 = if_else(variable == 'christmas','0.5',''))
model_table %>%
datatable(rownames = NULL)
model = run_model(data = data,
dv = dv,
model_table = model_table,
trans_df = trans_df)
model %>%
response_curves(
x_min = 0,
x_max = 30,
y_min = -20000,
y_max = 20000,
interval = 0.01
)
Similarly to the linea::what_next() function, described in the Additional Features page, linea has functions to run multiple models from specified combinations of variables and transformations: linea::what_trans() and linea::what_combo().
To find the right parameters for a non-linear relationship, the function linea::what_trans() can be used to run multiple models with a range of parameters. If parameters are passed for multiple transformations, the function will run models for all combinations. The inputs for this function are the model, the variable to test, and a transformations data frame (trans_df) specifying the values of the parameters.
In this case, the trans_df must contain the parameter values to be tested for each transformation in a val column, separated by commas:
trans_df = data.frame(
name = c('diminish', 'decay', 'lag', 'ma'),
func = c(
'linea::diminish(x,a)',
'linea::decay(x,a)',
'linea::lag(x,a)',
'linea::ma(x,a)'
),
order = 1:4,
val = c('0.5,10,100,1000,10000','0,0.5,0.8','','')
)
trans_df %>%
datatable(rownames = NULL)
Once the trans_df is ready, it can be passed to the linea::what_trans() function, which returns a table with the results of all combinations.
model %>%
what_trans(trans_df = trans_df,
variable ='offline_media') %>%
datatable(rownames = NULL)
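The number of models behind the what_trans() results grows multiplicatively with the values listed in val. Below is a base R sketch of the combination grid for the trans_df above; it only builds the grid and fits no model, and the way val is parsed here is an assumption rather than what_trans()'s code.

```r
# val column of the trans_df above, keyed by transformation name
vals <- c(diminish = '0.5,10,100,1000,10000', decay = '0,0.5,0.8', lag = '', ma = '')

# Transformations with an empty val are not varied
vals <- vals[vals != '']

# Split each string on commas and convert to numbers
values <- lapply(strsplit(vals, ','), as.numeric)
names(values) <- names(vals)

# One row per combination of parameter values
grid <- expand.grid(values)
nrow(grid)  # 15
head(grid, 3)
```

Transformations with an empty val contribute no column, so 5 diminish values times 3 decay values give 15 combinations.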
When modelling, testing one variable at a time can be time-consuming and inconclusive. For this reason, it is useful to be able to test wider ranges of models that span different variables and transformations.
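Using a similar set of transformations as before, we now need to specify the possible parameter values for each function and each variable. For functions with several parameters, the values of each parameter are grouped in parentheses: for offline_media, '(1,50),(1),(1,100)' tests a = 1 or 50, b = 1, and c = 1 or 100 for hill_function.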
trans_df = data.frame(
name = c('diminish', 'decay', 'hill', 'exp'),
func = c(
'linea::diminish(x,a)',
'linea::decay(x,a)',
"linea::hill_function(x,a,b,c)",
'(x^a)'
),
order = 1:4
) %>%
dplyr::mutate(offline_media = dplyr::if_else(condition = name == 'hill',
'(1,50),(1),(1,100)',
'')) %>%
dplyr::mutate(online_media = dplyr::if_else(condition = name == 'decay',
'.1,.7 ',
'')) %>%
dplyr::mutate(promo = '')
trans_df %>%
datatable(rownames = NULL)
We can now use that to test the specified combinations with linea::what_combo(). Due to the complexity of the combinations across transformations, parameters and variables, the results are stored in a list of data frames.
combinations = what_combo(model = model,trans_df = trans_df)
names(combinations)
## [1] "results" "trans_parameters"
combinations$results %>%
datatable(rownames = NULL)
combinations$trans_parameters
## $offline_media
## hill_a hill_b hill_c variable
## 1 1 1 1 offline_media
## 2 50 1 1 offline_media
## 3 1 1 100 offline_media
## 4 50 1 100 offline_media
##
## $online_media
## decay_a variable
## 1 0.1 online_media
## 2 0.7 online_media
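The trans_parameters output above can be reproduced by reading each parenthesised group in '(1,50),(1),(1,100)' as the candidate values of one hill_function parameter (a, b and c) and crossing them. This base R sketch of that reading is inferred from the output, not taken from what_combo()'s code.

```r
spec <- '(1,50),(1),(1,100)'

# Extract each "(...)" group: "(1,50)" "(1)" "(1,100)"
groups <- regmatches(spec, gregexpr('\\([^)]*\\)', spec))[[1]]

# Strip the parentheses and split each group into numeric values
params <- lapply(groups, function(g) as.numeric(strsplit(gsub('[()]', '', g), ',')[[1]]))
names(params) <- paste0('hill_', letters[seq_along(params)])

# Cross the candidate values of a, b and c
hill_grid <- expand.grid(params)
hill_grid
```

expand.grid() varies the first column fastest, which gives the same four rows, in the same order, as $offline_media above.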
The Getting Started page is a good place to start learning how to build linear models with linea.
The Additional Features page covers all other functions of the library.